Quantifying biodiversity starts with a sample (or multiple samples) of a biological assemblage. The size (e.g., area or volume) of the sample is often referred to as the sample grain. For example, we might use a 1m2 quadrat to sample the benthic community on a coral reef.
Or use a line intercept transect of a set length (e.g., 50m).
Then, choose a metric…but, there are lots to choose from!
The proliferation of metrics is likely due to the multicomponent nature of biodiversity. Different metrics describe (incorporate) three different components - abundance, richness, and evenness - to varying degrees (McGill 2011b). For diversity considered at only a single scale only, changes in these three components combine to determine variation in biodiversity. We want to choose metrics that allow us to understand and accurately describe how numbers of individuals, as well as the richness and relative abundance of species covary.
library(tidyverse)
library(mobsim)
library(mobr)
library(cowplot)
# set a RNG seed
set.seed(42)
# simulate a sample with 20 species and 200 individuals, and
# a lognormal SAD
sim_poisson_comm <- sim_poisson_community(s_pool = 20,
n_sim = 200,
sad_type = 'lnorm',
sad_coef = list('meanlog' = log(200/20),
'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm1 <- sim_poisson_comm$census %>%
as_tibble()
# need counts of each species for Species Abundance Distribution (and plotting
# rarefaction curve)
comm1_long <- comm1 %>%
group_by(species) %>%
summarise(N = n())
# calculate IBR, and put it in a tibble (dataframe) for plotting
comm1_ibr <- tibble(expected_richness = rarefaction(comm1_long$N, method = 'IBR')) %>%
mutate(individuals = 1:n())
# calculate the Probability of Interspecific Encounter (PIE)
comm1_PIE <- tibble(PIE = calc_div(comm1_long$N, index = 'PIE'))
# plot the sample and the IBR
comm1_map <- ggplot() +
geom_point(data = comm1,
aes(x = x, y = y, colour = species)) +
scale_color_viridis_d() +
theme_minimal() +
theme(legend.position = 'none',
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
panel.border = element_rect(fill = NA, colour = '#ff0000'))
comm1_ibr_plot <- ggplot() +
geom_line(data = comm1_ibr,
aes(x = individuals, y = expected_richness),
linewidth = 1.1) +
geom_point(aes(x = 1, y = 1),
size = 2) +
geom_text(aes(x = 35, y = 1, label = '{J = 1, S = 1}'),) +
geom_point(data = comm1_ibr,
aes(x = max(individuals), y = max(expected_richness)),
size = 2) +
geom_text(data = comm1_ibr %>%
# filter so as we only have the last row
filter(individuals==max(individuals)),
aes(x = individuals - 35, y = expected_richness - 3,
label = paste0('{J = ', max(individuals), ', S = ',
max(expected_richness), '}'))) +
# add a line to show PIE; I've got it going through the origin here,
# but it is the slope of the IBR as we move from one individual to two
geom_abline(data = comm1_PIE,
aes(intercept = 0, slope = PIE),
linewidth = 0.75, alpha = 0.5, linetype = 2) +
geom_text(data = comm1_PIE,
aes(x = 35, y = 17, label = paste0('PIE = ', round(PIE,2)))) +
labs(x = 'Total number of individuals (J)',
y = 'Expected species richness (S)') +
theme_bw()
plot_grid(comm1_map,
comm1_ibr_plot,
nrow = 1)
We can visualise the key components of individuals, richness and evenness using the individual-based rarefaction curve. Numbers of individuals are on the x-axis, and the number of species on the y-axis. Evenness, or the relative abundance of species, is described by the shape of the curve. In particular, the slope at the base of the rarefaction curve, as it moves from one individual to two, is equal to the Probability of Interspecific Encounter (PIE;(Hurlbert 1971; Olszewski 2004). In practice, PIE is often transformed to an Effective Number of Species (ENS) to aid interpretation (L. Jost 2006). The ENS transformation of PIE, which we refer to as \(S_{PIE}\), is equal to the inverse of Simpson’s index (\(S_{PIE} = 1/\sum_{i=1}^{S} p_i^2\)), where \(p_i\) is the relative abundance of species i. \(S_{PIE}\) is also equal to Hill number of diversity index with order equal to two (Hill 1973; Lou Jost 2007).
Simulate some data to investigate covariation in abundance, evenness and richness.
#--------set parameters for community abundance, richness and evenness-----
community_params <- expand_grid(
# number of species
Spool = c(10, 40, 80),
# total number of individuals
J = c(50, 500),
# evenness of species relative (log) abundance
# smaller values result in more even distributions
# of species relative abundance
sdlog = c(0.8,2)) %>%
mutate(
# species mean (log) abundance
meanlog = log(J/Spool)
) %>%
# create identifier
mutate(region = 1:n())
#--------simulate communities-------------------
communities <- community_params %>%
group_by(region) %>%
nest(data = c(Spool, J, sdlog, meanlog)) %>%
# create some replicates (multiple samples with the same parameters)
uncount(20, .remove = FALSE) %>%
ungroup() %>%
mutate(sim = 1:n()) %>%
# generate sample with individuals randomly distributed in space
# i.e., spatial sampling is poisson process
mutate(poisson_comm = map(data,
~sim_poisson_community(
s_pool = .x$Spool,
n_sim = .x$J,
sad_type = 'lnorm',
sad_coef = list('meanlog' = .x$meanlog,
'sdlog'= .x$sdlog)))) %>%
# prepare stem map for visualisation
mutate(poisson_stem_map = map(poisson_comm,
~tibble(x = .x$census$x,
y = .x$census$y,
species = .x$census$species))) %>%
# convert sample to SAD for further calculations
mutate(poisson_sad = map(poisson_comm, ~community_to_sad(.x)),
# to 2d
poisson_sad = map(poisson_sad, ~tibble(species = names(.x),
N = as.numeric(.x)))) %>%
# calculate IBR
mutate(poisson_ibr = map(poisson_sad, ~rarefaction(x = .x$N,
method = 'IBR')),
# wrangle for plotting
poisson_ibr = map(poisson_ibr, ~tibble(individuals = 1:length(.x),
expected_richness = as.numeric(.x))))
# examine abundance, richness and evenness for the different communities
metrics <- communities %>%
select(sim, region, data, poisson_sad) %>%
unnest(c(poisson_sad)) %>%
group_by(sim, region) %>%
summarise(J = sum(N),
S = calc_div(N, index = 'S'),
PIE = calc_div(N, index = 'PIE'),
S_PIE = calc_div(N, index = 'S_PIE')) %>%
ungroup() %>%
# put the known parameters back in for visualisation
left_join(community_params)
# plot ibr
communities %>%
unnest(c(data, poisson_ibr)) %>%
select(sim, region, individuals, expected_richness) %>%
mutate(spatial_dist = 'random') %>%
left_join(community_params, by = 'region') %>%
mutate(region_label = paste0('S = ',Spool,
', J = ', J,
', sd = ', sdlog)) %>%
ggplot() +
facet_wrap(~region_label, scales = 'free') +
geom_line(aes(x = individuals, y = expected_richness,
group = sim)) +
labs(y = 'Expected number of species',
x = 'Number of individuals') +
theme(legend.position = 'none')
# plot metrics
ggplot() +
geom_boxplot(data = metrics,
aes(x = as.factor(J), y = S,
group = region, colour = as.factor(sdlog))) +
scale_colour_manual(name = 'Evenness',
values = c('0.8' = '#ffa600',
'2' = '#003f5c'),
labels = c('0.8' = 'more even',
'2' = 'less even')) +
labs(y = 'Species richness',
x = 'Number of individuals')
Unfortunately, our simplification to three components is only useful for biodiversity accounting at one scale. The shape of the rarefaction curve is again a useful, albeit limited, abstraction of the difficulty ahead: biodiversity scales non-linearly (Arrhenius 1921; Rosenzweig 1995). The individual-based rarefaction curve is often limited because with sufficient sampling effort on the x-axis, the curve cab bend upwards again, e.g., the Species Area Relationship (SAR) can have bend upwards at very large spatial scales (Rosenzweig 1995), sometimes referred to as a triphasic SAR. The nonlinear scaling of biodiversity is well known, but not necessarily well understood, particularly the potential for non-linear scaling to impact estimates of biodiversity change at different scales. One way to approach the non-linear scaling is via autocorrelation or within species aggregation (McGill 2011a).
Here we will approach the scale-dependence of biodiversity using a diversity partition introduced by Whittaker (1960) that considers two discrete spatial scales and another to relate the two (Chase et al. 2018). Specifically, the larger (sometimes referred to as the regional) scale is often called the gamma (\(\gamma\)) scale, and the smaller (local) scale is called the alpha (\(\alpha\)) scale (and the mean \(\alpha\)-diversity is \(\bar{\alpha}\)). Whittaker’s diversity partition assumes a multiplicative relationship between the two scales:
\[\gamma = \bar{\alpha} * \beta,\] where \(\beta\)-diversity describes the variation among samples in species composition.
By making scale discrete, we can continue to use individual-based rarefaction curves to visualise and describe biodiversity. We’ll consider each sample as representing the \(\alpha\)-scale. And then combine multiple samples to create the \(\gamma\)-scale. The area encompassed by all samples combined is often referred to as the sample extent.
# simulate a community with 200 species and 5000 individuals, and
# a lognormal SAD, and individuals randomly distributed in space
Spool <- 200
Jpool <- 5000
sim_poisson_comm <- sim_poisson_community(s_pool = Spool,
n_sim = Jpool,
sad_type = 'lnorm',
sad_coef = list('meanlog' = log(Spool/Jpool),
'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm1 <- sim_poisson_comm$census %>%
as_tibble()
# simulate quadrat sampling from the community
quad_area <- 0.01
samps <- mobsim::sample_quadrats(comm = sim_poisson_comm,
n_quadrats = 5,
quadrat_area = quad_area,
method = 'random',
avoid_overlap = FALSE,
plot = FALSE)
quad_xy <- samps$xy_dat %>%
as_tibble() %>%
rownames_to_column(var = 'site')
# plot samples
plot_samples <- ggplot() +
geom_point(data = comm1,
aes(x = x, y = y, colour = species)) +
geom_rect(data = quad_xy,
aes(xmin = x, ymin = y,
xmax = x + sqrt(quad_area),
ymax = y + sqrt(quad_area)),
fill = alpha('grey',0),
colour = '#ff0000',
linewidth = 1.1) +
scale_color_viridis_d() +
theme_minimal() +
theme(legend.position = 'none',
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
panel.border = element_rect(fill = NA, colour = 'black'))
# need counts of each species for Species Abundance Distribution (and plotting
# rarefaction curve). Our samples are in a site x species matrix (sometimes
# called wide data [cf. long data]).
alpha_long <- samps$spec_dat %>%
as_tibble() %>%
rownames_to_column(var = 'site') %>%
pivot_longer(cols = !site,
names_to = 'species',
values_to = 'N')
# calculate IBR, and put it in a tibble (dataframe) for plotting
alpha_ibr <- alpha_long %>%
group_by(site) %>%
nest(data = c(species, N)) %>%
mutate(ibr = map(data, ~rarefaction(.x$N, method = 'IBR'))) %>%
unnest(ibr) %>%
mutate(individuals = 1:n()) %>%
ungroup()
# to get the gamma-scale curve, we need to aggregate the species counts
# across species
gamma_long <- alpha_long %>%
group_by(species) %>%
summarise(N = sum(N)) %>%
ungroup()
gamma_ibr <- tibble(gamma_richness = rarefaction(gamma_long$N, method = 'IBR')) %>%
mutate(individuals = 1:n())
# plot rarefaction curves
ibr_2scales <- ggplot() +
geom_line(data = alpha_ibr,
aes(x = individuals, y = ibr, group = site, colour = 'alpha')) +
geom_line(data = gamma_ibr,
aes(x = individuals, y = gamma_richness, colour = 'gamma')) +
scale_colour_manual(name = 'Scale',
values = c('alpha' = '#025c00',
'gamma' = '#ffc619'),
labels = c(expression(alpha),
expression(gamma))) +
labs(x = 'Number of individuals',
y = 'Expected number of species') +
theme_bw() +
theme(legend.position = c(0.99,0.01),
legend.justification = c(1,0))
plot_grid(plot_samples,
ibr_2scales,
nrow = 1)
We can use Whittaker’s diversity partition to calculate \(\beta\)-diversity for these data (i.e., \(\beta = \frac{\gamma}{\bar{\alpha}}\). Before doing so, what does it means when the \(\alpha\)-scale curves are so closely aligned with the \(\gamma\)-scale curve? The overlapping rarefaction curves means that the smaller (\(\alpha\)) scale samples are effectively a random subsample of the larger (\(\gamma\)) scale (Blowes, Belmaker, and Chase 2017; Chase et al. 2018). We know this to be true for these simulated data: the individuals of all species were distributed randomly in space (via a poisson point process model). Before diving into some beta-diversity metrics, let’s look at how within species aggregation changes what we see.
# simulate a community with 200 species and 5000 individuals, and
# a lognormal SAD (i.e., same as previous community), only now individuals will
# have an aggregated distribution in space
# Spool <- 200
# Jpool <- 5000
# see ?sim_thomas_community(): how is the spatial distribution of species
# controlled in this function?
sim_aggr_comm <- sim_thomas_community(s_pool = Spool,
n_sim = Jpool,
sad_type = 'lnorm',
sad_coef = list('meanlog' = log(Spool/Jpool),
'sdlog' = 1))
# get the census, which contains the spatial coordinates of every individual
comm2 <- sim_aggr_comm$census %>%
as_tibble()
# here, well use a quadrat to sample individuals from the community,
# set area of quadrat
quad_area <- 0.01
samps2 <- mobsim::sample_quadrats(comm = sim_aggr_comm,
n_quadrats = 5,
quadrat_area = quad_area,
# other spatial distributions for quadrats are available:
# grids and transects
method = 'random',
avoid_overlap = FALSE, # setting this to true breaks currently!
plot = FALSE)
# xy_dat has the xy coordinates for the lower left corner of every quadrat sample
quad_xy <- samps2$xy_dat %>%
as_tibble() %>%
rownames_to_column(var = 'site')
# plot the community and the location of the quadrat samples
plot_samples <- ggplot() +
geom_point(data = comm2,
aes(x = x, y = y, colour = species)) +
geom_rect(data = quad_xy,
aes(xmin = x, ymin = y,
xmax = x + sqrt(quad_area),
ymax = y + sqrt(quad_area)),
fill = alpha('grey',0),
colour = '#ff0000',
linewidth = 1.1) +
scale_color_viridis_d() +
theme_minimal() +
theme(legend.position = 'none',
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
panel.border = element_rect(fill = NA, colour = 'black'))
# We need counts of each species for Species Abundance Distribution (and plotting
# rarefaction curve). Our samples are in a site x species matrix (sometimes
# called wide data [cf. long data]).
# wide to long (for plotting)
alpha_long2 <- samps2$spec_dat %>%
as_tibble() %>%
rownames_to_column(var = 'site') %>%
pivot_longer(cols = !site,
names_to = 'species',
values_to = 'N')
# calculate IBR, and put it in a tibble (dataframe) for plotting
alpha_ibr2 <- alpha_long2 %>%
group_by(site) %>%
nest(data = c(species, N)) %>%
mutate(ibr = map(data, ~rarefaction(.x$N, method = 'IBR'))) %>%
unnest(ibr) %>%
mutate(individuals = 1:n()) %>%
ungroup()
# to get the gamma-scale curve, we need to aggregate the individual counts for
# each species
gamma_long2 <- alpha_long2 %>%
group_by(species) %>%
summarise(N = sum(N)) %>%
ungroup()
# calculate the gamma-scale IBR
gamma_ibr2 <- tibble(gamma_richness = rarefaction(gamma_long2$N, method = 'IBR')) %>%
mutate(individuals = 1:n())
# plot rarefaction curves (both scales)
ibr_2scales2 <- ggplot() +
geom_line(data = alpha_ibr2,
aes(x = individuals, y = ibr, group = site, colour = 'alpha')) +
geom_line(data = gamma_ibr2,
aes(x = individuals, y = gamma_richness, colour = 'gamma')) +
scale_colour_manual(name = 'Scale',
values = c('alpha' = '#025c00',
'gamma' = '#ffc619'),
labels = c(expression(alpha),
expression(gamma))) +
labs(x = 'Number of individuals',
y = 'Expected number of species') +
theme_bw() +
theme(legend.position = c(0.99,0.01),
legend.justification = c(1,0))
plot_grid(plot_samples,
ibr_2scales2,
nrow = 1)
Now, we see that the \(\gamma\)-scale curve does not fall directly on top of the \(\alpha\)-scale curves. This means that the smaller scale samples no longer capture a random subset of the species captured by all the samples combined. This is due to within species aggregation. Environmental conditions are heterogeneous in nature, and the conditions our samples document will typically be more similar when the samples are in close proximity to each other (in space &/or time). Because species often have idiosyncratic responses to environmental heterogeneity (species may have different niches (Chase and Leibold 2003) for example), and other processes that influence the spatial distribution of individuals (e.g., dispersal) likely vary among species (and at the very least are stochastic to some degree for individuals), most species exhibit some degree of within species aggregation. So, communities - and hopefully our samples that we take to represent them - likely fall somewhere between these two extremes (of closely overlapping accumulation curves at the \(\alpha\)- and \(\gamma\)-scales, and strongly diverging).
When we embrace the scale-dependent nature of biodiversity variation there are four components that we’d like our biodiversity metrics to accurately describe: abundance, evenness, richness, and within species aggregation (Chase et al. 2018; McGlinn et al. 2019). Changes in all of these components underpin variation in diversity. We stand to gain new insights with an accurate accounting of how they combine across scales.
Use mobsim to simulate sampling from some different communities. Vary community size (total abundance), the size of the species pool, evenness, and aggregation, but take the same number of samples from each community. Plot rarefaction curves and calculate some metrics to explore the relationship between different components at the \(\alpha-\) and \(\gamma\)-scales.